Results 1 - 14 of 14
1.
Nucleic Acids Res ; 52(D1): D164-D173, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37930866

ABSTRACT

Plasmids are mobile genetic elements found in many clades of Archaea and Bacteria. They drive horizontal gene transfer, impacting ecological and evolutionary processes within microbial communities, and hold substantial importance in human health and biotechnology. To support plasmid research and provide scientists with data on an unprecedented diversity of plasmid sequences, we introduce the IMG/PR database, a new resource encompassing 699 973 plasmid sequences derived from genomes, metagenomes and metatranscriptomes. IMG/PR is the first database to provide data on plasmids that were systematically identified from diverse microbiome samples. IMG/PR plasmids are associated with rich metadata that includes geographical and ecosystem information, host taxonomy, similarity to other plasmids, functional annotation, and the presence of genes involved in conjugation and antibiotic resistance. The database offers diverse methods for exploring its extensive plasmid collection, enabling users to navigate plasmids through metadata-centric queries, plasmid comparisons and BLAST searches. The web interface for IMG/PR is accessible at https://img.jgi.doe.gov/pr. Plasmid metadata and sequences can be downloaded from https://genome.jgi.doe.gov/portal/IMG_PR.


Subjects
Metagenome , Microbiota , Humans , Metadata , Software , Databases, Genetic , Plasmids/genetics
2.
Nature ; 622(7983): 594-602, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37821698

ABSTRACT

Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities [1,2]. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database [3]. Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.


Subjects
Metagenome , Metagenomics , Microbiology , Proteins , Cluster Analysis , Metagenome/genetics , Metagenomics/methods , Proteins/chemistry , Proteins/classification , Proteins/genetics , Databases, Protein , Protein Conformation
3.
Nat Biotechnol ; 2023 Sep 21.
Article in English | MEDLINE | ID: mdl-37735266

ABSTRACT

Identifying and characterizing mobile genetic elements in sequencing data is essential for understanding their diversity, ecology, biotechnological applications and impact on public health. Here we introduce geNomad, a classification and annotation framework that combines information from gene content and a deep neural network to identify sequences of plasmids and viruses. geNomad uses a dataset of more than 200,000 marker protein profiles to provide functional gene annotation and taxonomic assignment of viral genomes. Using a conditional random field model, geNomad also detects proviruses integrated into host genomes with high precision. In benchmarks, geNomad achieved high classification performance for diverse plasmids and viruses (Matthews correlation coefficient of 77.8% and 95.3%, respectively), substantially outperforming other tools. Leveraging geNomad's speed and scalability, we processed over 2.7 trillion base pairs of sequencing data, leading to the discovery of millions of viruses and plasmids that are available through the IMG/VR and IMG/PR databases. geNomad is available at https://portal.nersc.gov/genomad.
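
For readers unfamiliar with the metric quoted in this abstract, the Matthews correlation coefficient (MCC) can be computed from a binary confusion matrix as sketched below. This is a generic illustration, not geNomad's benchmarking code; the function name and the example counts are hypothetical. The abstract reports MCC as a percentage, which corresponds to the value below multiplied by 100.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns a value in [-1, 1]; 1 is perfect agreement, 0 is no better
    than chance, -1 is total disagreement.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only (not taken from the paper).
mcc = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
print(f"MCC = {mcc:.3f} ({mcc * 100:.1f}%)")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when the classes are imbalanced.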

4.
Microbiol Spectr ; 11(4): e0020023, 2023 08 17.
Article in English | MEDLINE | ID: mdl-37310219

ABSTRACT

Petabases of environmental metagenomic data are publicly available, presenting an opportunity to characterize complex environments and discover novel lineages of life. Metagenome coassembly, in which many metagenomic samples from an environment are simultaneously analyzed to infer the underlying genomes' sequences, is an essential tool for achieving this goal. We applied MetaHipMer2, a distributed metagenome assembler that runs on supercomputing clusters, to coassemble 3.4 terabases (Tbp) of metagenome data from a tropical soil in the Luquillo Experimental Forest (LEF), Puerto Rico. The resulting coassembly yielded 39 high-quality (>90% complete, <5% contaminated, with predicted 23S, 16S, and 5S rRNA genes and ≥18 tRNAs) metagenome-assembled genomes (MAGs), including two from the candidate phylum Eremiobacterota. Another 268 medium-quality (≥50% complete, <10% contaminated) MAGs were extracted, including the candidate phyla Dependentiae, Dormibacterota, and Methylomirabilota. In total, 307 medium- or higher-quality MAGs were assigned to 23 phyla, compared to 294 MAGs assigned to nine phyla in the same samples individually assembled. The low-quality (<50% complete, <10% contaminated) MAGs from the coassembly revealed a 49% complete rare biosphere microbe from the candidate phylum FCPU426 among other low-abundance microbes, an 81% complete fungal genome from the phylum Ascomycota, and 30 partial eukaryotic MAGs with ≥10% completeness, possibly representing protist lineages. A total of 22,254 viruses, many of them low abundance, were identified. Estimation of metagenome coverage and diversity indicates that we may have characterized ≥87.5% of the sequence diversity in this humid tropical soil and indicates the value of future terabase-scale sequencing and coassembly of complex environments. IMPORTANCE Petabases of reads are being produced by environmental metagenome sequencing. An essential step in analyzing these data is metagenome assembly, the computational reconstruction of genome sequences from microbial communities. "Coassembly" of metagenomic sequence data, in which multiple samples are assembled together, enables more complete detection of microbial genomes in an environment than "multiassembly," in which samples are assembled individually. To demonstrate the potential for coassembling terabases of metagenome data to drive biological discovery, we applied MetaHipMer2, a distributed metagenome assembler that runs on supercomputing clusters, to coassemble 3.4 Tbp of reads from a humid tropical soil environment. The resulting coassembly, its functional annotation, and analysis are presented here. The coassembly yielded more, and phylogenetically more diverse, microbial, eukaryotic, and viral genomes than the multiassembly of the same data. Our resource may facilitate the discovery of novel microbial biology in tropical soils and demonstrates the value of terabase-scale metagenome sequencing.
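
The MAG quality tiers quoted in this abstract can be expressed as a small decision rule. The sketch below is only an illustration of those stated thresholds, not the authors' pipeline; the `Mag` class, its field names and the "unclassified" label for bins with 10% or more contamination are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Mag:
    """Hypothetical summary record for one metagenome-assembled genome."""
    completeness: float   # percent
    contamination: float  # percent
    has_23s: bool
    has_16s: bool
    has_5s: bool
    trna_count: int


def classify_mag(mag: Mag) -> str:
    """Assign a quality tier using the thresholds quoted in the abstract."""
    if mag.contamination >= 10:
        # The abstract does not name this case; the label is an assumption.
        return "unclassified"
    has_all_rrnas = mag.has_23s and mag.has_16s and mag.has_5s
    if (mag.completeness > 90 and mag.contamination < 5
            and has_all_rrnas and mag.trna_count >= 18):
        return "high"
    if mag.completeness >= 50:
        return "medium"
    return "low"


print(classify_mag(Mag(95.0, 2.0, True, True, True, 20)))   # high
print(classify_mag(Mag(95.0, 2.0, True, False, True, 20)))  # medium
print(classify_mag(Mag(49.0, 3.0, False, False, False, 5))) # low
```

Note that under this rule a nearly complete, clean bin that lacks one of the rRNA genes falls back to the medium tier, which is how the second example is classified.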


Subjects
Microbiota , Soil , Microbiota/genetics , Bacteria/genetics , Metagenome , Genome, Viral , Metagenomics/methods
5.
PLoS Biol ; 21(4): e3002083, 2023 04.
Article in English | MEDLINE | ID: mdl-37083735

ABSTRACT

The extraordinary diversity of viruses infecting bacteria and archaea is now primarily studied through metagenomics. While metagenomes enable high-throughput exploration of the viral sequence space, metagenome-derived sequences lack key information compared to isolated viruses, in particular host association. Different computational approaches are available to predict the host(s) of uncultivated viruses based on their genome sequences, but thus far individual approaches are limited either in precision or in recall, i.e., for a number of viruses they yield erroneous predictions or no prediction at all. Here, we describe iPHoP, a two-step framework that integrates multiple methods to reliably predict host taxonomy at the genus rank for a broad range of viruses infecting bacteria and archaea, while retaining a low false discovery rate. Based on a large dataset of metagenome-derived virus genomes from the IMG/VR database, we illustrate how iPHoP can provide extensive host prediction and guide further characterization of uncultivated viruses.


Subjects
Archaea , Viruses , Archaea/genetics , Metagenome/genetics , Viruses/genetics , Bacteria/genetics , Metagenomics/methods , Machine Learning , Genome, Viral/genetics
6.
Cell ; 186(3): 646-661.e4, 2023 02 02.
Article in English | MEDLINE | ID: mdl-36696902

ABSTRACT

Viroids and viroid-like covalently closed circular (ccc) RNAs are minimal replicators that typically encode no proteins and hijack cellular enzymes for replication. The extent and diversity of viroid-like agents are poorly understood. We developed a computational pipeline to identify viroid-like cccRNAs and applied it to 5,131 metatranscriptomes and 1,344 plant transcriptomes. The search yielded 11,378 viroid-like cccRNAs spanning 4,409 species-level clusters, a 5-fold increase compared to the previously identified viroid-like elements. Within this diverse collection, we discovered numerous putative viroids, satellite RNAs, retrozymes, and ribozy-like viruses. Diverse ribozyme combinations and unusual ribozymes within the cccRNAs were identified. Self-cleaving ribozymes were identified in ambiviruses, some mito-like viruses and capsid-encoding satellite virus-like cccRNAs. The broad presence of viroid-like cccRNAs in diverse transcriptomes and ecosystems implies that their host range is far broader than currently known, and matches to CRISPR spacers suggest that some cccRNAs replicate in prokaryotes.


Subjects
RNA, Catalytic , Viroids , RNA, Circular/metabolism , Viroids/genetics , Viroids/metabolism , RNA, Catalytic/genetics , RNA, Viral/genetics , RNA, Viral/metabolism , Ecosystem , Plant Diseases
7.
Nucleic Acids Res ; 51(D1): D733-D743, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36399502

ABSTRACT

Viruses are widely recognized as critical members of all microbiomes. Metagenomics enables large-scale exploration of the global virosphere, progressively revealing the extensive genomic diversity of viruses on Earth and highlighting the myriad of ways by which viruses impact biological processes. IMG/VR provides access to the largest collection of viral sequences obtained from (meta)genomes, along with functional annotation and rich metadata. A web interface enables users to efficiently browse and search viruses based on genome features and/or sequence similarity. Here, we present the fourth version of IMG/VR, composed of >15 million virus genomes and genome fragments, a ≈6-fold increase in size compared to the previous version. These clustered into 8.7 million viral operational taxonomic units, including 231 408 with at least one high-quality representative. Viral sequences in IMG/VR are now systematically identified from genomes, metagenomes, and metatranscriptomes using a new detection approach (geNomad), and IMG standard annotations are complemented with genome quality estimation using CheckV, taxonomic classification reflecting the latest taxonomic standards, and microbial host taxonomy prediction. IMG/VR v4 is available at https://img.jgi.doe.gov/vr, and the underlying data are available to download at https://genome.jgi.doe.gov/portal/IMG_VR.


Subjects
Databases, Genetic , Genome, Viral , Metadata , Metagenomics , Software
8.
bioRxiv ; 2023 Dec 19.
Article in English | MEDLINE | ID: mdl-38187747

ABSTRACT

The majority of bacteriophage diversity remains uncharacterised, and new intriguing mechanisms of their biology are being continually described. Members of some phage lineages, such as the Crassvirales, repurpose stop codons to encode an amino acid by using alternate genetic codes. Here, we investigated the prevalence of stop codon reassignment in phage genomes and subsequent impacts on functional annotation. We predicted 76 genomes within INPHARED and 712 vOTUs from the Unified Human Gut Virome catalogue (UHGV) that repurpose a stop codon to encode an amino acid. We re-annotated these sequences with modified versions of Pharokka and Prokka, called Pharokka-gv and Prokka-gv, to automatically predict stop codon reassignment prior to annotation. Both tools significantly improved the quality of annotations, with Pharokka-gv performing best. For sequences predicted to repurpose TAG to glutamine (translation table 15), Pharokka-gv increased the median gene length (median of per genome medians) from 287 to 481 bp for UHGV sequences (67.8% increase) and from 318 to 550 bp for INPHARED sequences (72.9% increase). The re-annotation increased mean coding density from 66.8% to 90.0%, and from 69.0% to 89.8% for UHGV and INPHARED sequences. Furthermore, the proportion of genes that could be assigned functional annotation increased, including an increase in the number of major capsid proteins that could be identified. We propose that automatic prediction of stop codon reassignment before annotation is beneficial to downstream viral genomic and metagenomic analyses.
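
To illustrate why stop codon reassignment changes predicted gene lengths, the sketch below translates the same toy sequence under the standard genetic code and under a table in which TAG encodes glutamine, as the abstract describes for translation table 15. This is a minimal, self-contained illustration and not how Pharokka-gv or Prokka-gv work internally; the sequence and function name are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Table with TAG reassigned from stop to glutamine (Q).
TAG_TO_GLN = {**STANDARD, "TAG": "Q"}


def translate_until_stop(dna: str, table: dict) -> str:
    """Translate in frame from position 0 until the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy sequence (hypothetical): ATG AAA TAG CCC GGG TAA
seq = "ATGAAATAGCCCGGGTAA"
print(translate_until_stop(seq, STANDARD))    # MK
print(translate_until_stop(seq, TAG_TO_GLN))  # MKQPG
```

Under the standard code the in-frame TAG truncates the prediction after two residues, whereas reading TAG as glutamine extends it to the next stop codon, which is the kind of effect behind the longer median gene lengths and higher coding densities reported in the abstract.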

9.
Cell ; 185(21): 4023-4037.e18, 2022 10 13.
Article in English | MEDLINE | ID: mdl-36174579

ABSTRACT

High-throughput RNA sequencing offers broad opportunities to explore the Earth RNA virome. Mining 5,150 diverse metatranscriptomes uncovered >2.5 million RNA virus contigs. Analysis of >330,000 RNA-dependent RNA polymerases (RdRPs) shows that this expansion corresponds to a 5-fold increase of the known RNA virus diversity. Gene content analysis revealed multiple protein domains previously not found in RNA viruses and implicated in virus-host interactions. Extended RdRP phylogeny supports the monophyly of the five established phyla and reveals two putative additional bacteriophage phyla and numerous putative additional classes and orders. The dramatically expanded phylum Lenarviricota, consisting of bacterial and related eukaryotic viruses, now accounts for a third of the RNA virome. Identification of CRISPR spacer matches and bacteriolytic proteins suggests that subsets of picobirnaviruses and partitiviruses, previously associated with eukaryotes, infect prokaryotic hosts.


Subjects
Bacteriophages , RNA Viruses , Bacteriophages/genetics , DNA-Directed RNA Polymerases/genetics , Genome, Viral , Phylogeny , RNA , RNA Viruses/genetics , RNA-Dependent RNA Polymerase/genetics , Virome
10.
Biotechnol Biofuels Bioprod ; 15(1): 57, 2022 May 20.
Article in English | MEDLINE | ID: mdl-35596177

ABSTRACT

BACKGROUND: The need to mitigate and substitute the use of fossil fuels as the main energy matrix has led to the study and development of biofuels as an alternative. Second-generation (2G) ethanol arises as one biofuel with great potential, due to not only maintaining food security, but also as a product from economically interesting crops such as energy-cane. One of the main challenges of 2G ethanol is the inefficient uptake of pentose sugars by industrial yeast Saccharomyces cerevisiae, the main organism used for ethanol production. Understanding the main drivers for xylose assimilation and identifying novel and efficient transporters is a key step to make the 2G process economically viable. RESULTS: By implementing a strategy of searching for present motifs that may be responsible for xylose transport and past adaptations of sugar transporters in xylose fermenting species, we obtained a classifying model which was successfully used to select four different candidate transporters for evaluation in the S. cerevisiae hxt-null strain, EBY.VW4000, harbouring the xylose consumption pathway. Yeast cells expressing the transporters SpX, SpH and SpG showed a superior uptake performance in xylose compared to traditional literature control Gxf1. CONCLUSIONS: Modelling xylose transport with the small data available for yeast and bacteria proved a challenge that was overcome through different statistical strategies. Through this strategy, we present four novel xylose transporters which expand the repertoire of candidates targeting yeast genetic engineering for industrial fermentation. The repeated use of the model for characterizing new transporters will be useful both for finding the best candidates for industrial utilization and for increasing the model's predictive capabilities.

11.
mBio ; 12(6): e0322121, 2021 12 21.
Article in English | MEDLINE | ID: mdl-34903049

ABSTRACT

The routes of uptake and efflux should be considered when developing new drugs so that they can effectively address their intracellular targets. As a general rule, drugs appear to enter cells via protein carriers that normally carry nutrients or metabolites. A previously developed pipeline that searched for drug transporters using Saccharomyces cerevisiae mutants carrying single-gene deletions identified import routes for most compounds tested. However, due to the redundancy of transporter functions, we propose that this methodology can be improved by utilizing double mutant strains in both low- and high-throughput screens. We constructed a library of over 14,000 strains harboring double deletions of genes encoding 122 nonessential plasma membrane transporters and performed low- and high-throughput screens identifying possible drug import routes for 23 compounds. In addition, the high-throughput assay enabled the identification of putative efflux routes for 21 compounds. Focusing on azole antifungals, we were able to identify the involvement of the myo-inositol transporter, Itr1p, in the uptake of these molecules and to confirm the role of Pdr5p in their export. IMPORTANCE Our library of double transporter deletion strains is a powerful tool for rapid identification of potential drug import and export routes, which can aid in determining the chemical groups necessary for transport via specific carriers. This information may be translated into a better design of drugs for optimal absorption by target tissues and the development of drugs whose utility is less likely to be compromised by the selection of resistant mutants.


Subjects
ATP-Binding Cassette Transporters/genetics , Gene Deletion , Monosaccharide Transport Proteins/genetics , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/metabolism , Xenobiotics/metabolism , ATP-Binding Cassette Transporters/metabolism , Antifungal Agents/metabolism , Antifungal Agents/pharmacology , Biological Transport , Gene Library , High-Throughput Screening Assays , Monosaccharide Transport Proteins/metabolism , Saccharomyces cerevisiae/drug effects , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/metabolism , Xenobiotics/pharmacology
12.
ACS Infect Dis ; 7(4): 759-776, 2021 04 09.
Article in English | MEDLINE | ID: mdl-33689276

ABSTRACT

Antimalarial drugs with novel modes of action and wide therapeutic potential are needed to pave the way for malaria eradication. Violacein is a natural compound known for its biological activity against cancer cells and several pathogens, including the malaria parasite, Plasmodium falciparum (Pf). Herein, using chemical genomic profiling (CGP), we found that violacein affects protein homeostasis. Mechanistically, violacein binds Pf chaperones, PfHsp90 and PfHsp70-1, compromising the latter's ATPase and chaperone activities. Additionally, violacein-treated parasites exhibited increased protein unfolding and proteasomal degradation. The uncoupling of the parasite stress response reflects the multistage growth inhibitory effect promoted by violacein. Despite evidence of proteotoxic stress, violacein did not inhibit global protein synthesis via UPR activation, a process that is highly dependent on chaperones, in agreement with the notion of a violacein-induced proteostasis collapse. Our data highlight the importance of a functioning chaperone-proteasome system for parasite development and differentiation. Thus, a violacein-like small molecule might provide a good scaffold for development of a novel probe for examining the molecular chaperone network and/or antiplasmodial drug design.


Subjects
Antimalarials , Antimalarials/pharmacology , Indoles/pharmacology , Molecular Chaperones , Plasmodium falciparum
13.
Nat Biotechnol ; 39(5): 578-585, 2021 05.
Article in English | MEDLINE | ID: mdl-33349699

ABSTRACT

Millions of new viral sequences have been identified from metagenomes, but the quality and completeness of these sequences vary considerably. Here we present CheckV, an automated pipeline for identifying closed viral genomes, estimating the completeness of genome fragments and removing flanking host regions from integrated proviruses. CheckV estimates completeness by comparing sequences with a large database of complete viral genomes, including 76,262 identified from a systematic search of publicly available metagenomes, metatranscriptomes and metaviromes. After validation on mock datasets and comparison to existing methods, we applied CheckV to large and diverse collections of metagenome-assembled viral sequences, including IMG/VR and the Global Ocean Virome. This revealed 44,652 high-quality viral genomes (that is, >90% complete), although the vast majority of sequences were small fragments, which highlights the challenge of assembling viral genomes from short-read metagenomes. Additionally, we found that removal of host contamination substantially improved the accurate identification of auxiliary metabolic genes and interpretation of viral-encoded functions.


Subjects
Genome, Viral/genetics , Metagenome/genetics , Metagenomics , Software , Molecular Sequence Annotation
14.
Sci Data ; 6(1): 140, 2019 07 31.
Article in English | MEDLINE | ID: mdl-31366912

ABSTRACT

The rocky, seasonally-dry and nutrient-impoverished soils of the Brazilian campos rupestres impose severe growth-limiting conditions on plants. Species of a dominant plant family, Velloziaceae, are highly specialized to low-nutrient conditions and seasonal water availability of this environment, where phosphorus (P) is the key limiting nutrient. Despite plant-microbe associations playing critical roles in stressful ecosystems, the contribution of these interactions in the campos rupestres remains poorly studied. Here we present the first microbiome data of Velloziaceae spp. thriving in contrasting substrates of campos rupestres. We assessed the microbiomes of Vellozia epidendroides, which occupies shallow patches of soil, and Barbacenia macrantha, growing on exposed rocks. The prokaryotic and fungal profiles were assessed by rRNA barcode sequencing of epiphytic and endophytic compartments of roots, stems, leaves and surrounding soil/rocks. We also generated root and substrate (rock/soil)-associated metagenomes of each plant species. We foresee that these data will help decipher how the microbiome contributes to plant functioning in the campos rupestres, and help unravel new strategies for improved crop productivity in stressful environments.


Subjects
Magnoliopsida/microbiology , Microbiota , Phosphorus/chemistry , Soil Microbiology , Soil/chemistry , Bacteria/classification , Biodiversity , Brazil , Fungi/classification , Metagenome , Methyltransferases/genetics , Sequence Analysis, DNA